Minnesota Tree Growth Data

Author

Avery Eastman

Code
tree_data <- read.csv("/Users/Avery/github/Lab_2_tree_growth_data/data/doi_10_5061_dryad_18pm5__v20170130/Itter_et_al_EAP16-0589.R1/tree_dat.csv")
head(tree_data)
  treeID standID stand year species age   inc   rad_ib
1      1       1    A1 1960    ABBA   1 0.930 10.78145
2      1       1    A1 1961    ABBA   2 0.950 11.73145
3      1       1    A1 1962    ABBA   3 0.985 12.71645
4      1       1    A1 1963    ABBA   4 0.985 13.70145
5      1       1    A1 1964    ABBA   5 0.715 14.41645
6      1       1    A1 1965    ABBA   6 0.840 15.25645

Question 1: Use glimpse to understand the structure and names of the dataset. Decribe the structure and what you see in the dataset?

Code
library(dplyr)
glimpse(tree_data)
Rows: 131,386
Columns: 8
$ treeID  <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ standID <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ stand   <chr> "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A…
$ year    <int> 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 19…
$ species <chr> "ABBA", "ABBA", "ABBA", "ABBA", "ABBA", "ABBA", "ABBA", "ABBA"…
$ age     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,…
$ inc     <dbl> 0.930, 0.950, 0.985, 0.985, 0.715, 0.840, 0.685, 0.940, 1.165,…
$ rad_ib  <dbl> 10.78145, 11.73145, 12.71645, 13.70145, 14.41645, 15.25645, 15…

The dataset is made up of 131,386 rows and 8 variable columns. The 8 variables names are treeID, standID, stand, year, species, age, inc, rad_ib

Question 2: How many records have been made in stand 1?

Code
stand_1 <- filter(tree_data, standID == 1)

979 records have been made in stand 1

Question 3: How many records of the Abies balsamea and Pinus strobus species have been made?

Code
number_of_ABBA_and_PIST <- filter(tree_data, species == "ABBA" | species == "PIST")

17,221 records of the Abies balsamea and Pinus strobus species have been made

Question 4: How many trees are older then 200 years old in the last year of the dataset?

Code
trees_200_old <- filter(tree_data, age > 200, year == 2007)

In 2007, the last year of the dataset, 7 trees are older then 200 years old ### Question 5: What is the oldest tree in the dataset found using slice_max?

Code
slice_max(tree_data, age, n = 1)
  treeID standID stand year species age  inc rad_ib
1     24       2    A2 2007    PIRE 269 0.37 308.84

The oldest tree in the dataset is 269 years old

Question 6: Find the oldest 5 trees recorded in 2001. Use the help docs to understand optional parameters

Code
tree_data |>
  filter(year == 2001) |>
  arrange(desc(age)) |>
  slice_head(n = 5)
  treeID standID stand year species age   inc  rad_ib
1     24       2    A2 2001    PIRE 263 0.210 306.880
2     25       2    A2 2001    PIRE 259 0.280 156.210
3   1595      24    F1 2001    FRNI 212 0.579 156.267
4   1598      24    F1 2001    FRNI 206 0.394 130.251
5   1712      26    F3 2001    FRNI 206 0.168 154.354

Question 7: Using slice_sample, how many trees are in a 30% sample of those recorded in 2002?

Code
trees_in_30 <- tree_data |>
  filter(year == 2002) |>
  slice_sample(prop = 0.3)

687 trees are in a 30% sample of those recorded in 2002

Question 8: Filter all trees in stand 5 in 2007. Sort this subset by descending radius at breast height (rad_ib) and use slice_head() to get the top three trees. Report the tree IDs

Code
tree_data |>
  filter(standID == 5, year == 2007) |>
  arrange(desc(rad_ib)) |>
  slice_head(n = 3)
  treeID standID stand year species age   inc   rad_ib
1    128       5    A6 2007    PIST  82 0.885 238.8850
2    157       5    A6 2007    PIRE  85 0.900 217.8700
3    135       5    A6 2007    PIMA  84 0.110 210.1874

The tree IDs are: 128, 157, 135

Question 9: Reduce your full data.frame to [treeID, stand, year, and radius at breast height]. Filter to only those in stand 3 with records from 2007, and use slice_min to pull the smallest three trees meaured that year.

Code
tree_data |>
  select(treeID, standID, year, rad_ib) |>
  filter(standID == 3, year == 2007) |>
  slice_min(rad_ib, n = 3)
  treeID standID year rad_ib
1     50       3 2007 47.396
2     56       3 2007 48.440
3     36       3 2007 54.925

Question 10: Use select to remove the stand column. Use glimspe to show the dataset.

Code
tree_data |>
  select(! stand) |>
  glimpse()
Rows: 131,386
Columns: 7
$ treeID  <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ standID <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ year    <int> 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 19…
$ species <chr> "ABBA", "ABBA", "ABBA", "ABBA", "ABBA", "ABBA", "ABBA", "ABBA"…
$ age     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,…
$ inc     <dbl> 0.930, 0.950, 0.985, 0.985, 0.715, 0.840, 0.685, 0.940, 1.165,…
$ rad_ib  <dbl> 10.78145, 11.73145, 12.71645, 13.70145, 14.41645, 15.25645, 15…

Question 11: Look at the help document for dplyr::select and examine the “Overview of selection features”. Identify an option (there are multiple) that would help select all columns with the string “ID” in the name. Using glimpse to view the remaining dataset

Code
tree_data |>
  select(contains("ID")) |>
  glimpse()
Rows: 131,386
Columns: 2
$ treeID  <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ standID <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…

Question 12: Find a selection pattern that captures all columns with either ‘ID’ or ‘stand’ in the name. Use glimpse to verify the selection.

Code
tree_data |>
  select(contains("ID") | contains("stand")) |>
  glimpse()
Rows: 131,386
Columns: 3
$ treeID  <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ standID <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ stand   <chr> "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A…

Question 13: Looking back at the data dictionary, rename rad_inc and inc to include _[unit] in the name. Unlike earlier options, be sure that this renaming is permanent, and stays with your data.frame (e.g. <-). Use glimpse to view your new data.frame.

Code
tree_data_unit <- tree_data |>
  rename(rad_ib_mm = rad_ib, inc_mm = inc) |>
  glimpse()
Rows: 131,386
Columns: 8
$ treeID    <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ standID   <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ stand     <chr> "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", "A1", …
$ year      <int> 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, …
$ species   <chr> "ABBA", "ABBA", "ABBA", "ABBA", "ABBA", "ABBA", "ABBA", "ABB…
$ age       <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
$ inc_mm    <dbl> 0.930, 0.950, 0.985, 0.985, 0.715, 0.840, 0.685, 0.940, 1.16…
$ rad_ib_mm <dbl> 10.78145, 11.73145, 12.71645, 13.70145, 14.41645, 15.25645, …

Question 14: A key measurement in forestry in “basal area column”. The metric is computed with the formula:

BA(m2) = 0.00007854⋅DBH^2 Where DBH is the diameter at breast height (cm). Use mutate to compute DBH in centimeters, and BA in m2 (HINT: Make sure rad_ib is in cm prior to computing the diameter!). What is the mean BA_m2 of the species POTR in 2007?

Code
tree_data_unit <- tree_data_unit |>
  mutate(
    rad_ib_cm = rad_ib_mm / 10,
    DBH_cm = 2 * rad_ib_cm,
    BA_m2 = 0.00007854 * DBH_cm^2
  )
Code
tree_data_unit |>
  filter(species == "POTR", year == 2007) |>
  summarise(mean_BA_m2 = mean(BA_m2))
  mean_BA_m2
1 0.03696619

In 2007 the POTR species had a mean basal area of 0.03696619 m2

Question 15: Lets say for the sake of our study, trees are not established until they are 5 years of age. Use if_else to add a boolean column to our dataset called established that is TRUE if the age is greater then 5 and FALSE if less then or equal to five. Once added, use count (see ?count) to determine how many records are from estabilshed trees?

Code
tree_data_unit <- tree_data_unit |>
  mutate(
    established = if_else(age > 5, TRUE, FALSE)
  )
Code
tree_data_unit |>
  count(established == TRUE)
  established == TRUE      n
1               FALSE   8883
2                TRUE 122503

There are 122503 records from established trees

Question 16: Use mutate and case_when to add a new column to you data.frame that classifies each tree into the proper DBH_class. Once done, limit your dataset to the year 2007 and report the number of each class with count.

Code
tree_data_unit <- tree_data_unit |>
  mutate(DBH_class = case_when(
    DBH_cm > 0 & DBH_cm <= 2.5 ~ "seedling",
    DBH_cm > 2.5 & DBH_cm <= 10 ~ "sapling",
    DBH_cm > 10 & DBH_cm <= 30 ~ "pole",
    DBH_cm > 30 ~ "sawlog"
  ))
Code
tree_data_unit |>
  filter(year == 2007) |>
  count(DBH_class)
  DBH_class    n
1      pole 1963
2   sapling  252
3    sawlog   76

In 2007 there were 0 trees classified as seedlings, 252 trees classified as saplings, 1,963 trees classified as pole, and 76 trees classified as sawlog

Question 17: Compute the mean DBH (in cm) and standard deviation of DBH (in cm) for all trees in 2007. Explain the values you found and their statistical meaning.

Code
tree_data_unit |>
  filter(year == 2007) |>
  summarise(
    mean_DBH = mean(DBH_cm, na.rm = TRUE),
    sd_DBH = sd(DBH_cm, na.rm = TRUE)
  )
  mean_DBH   sd_DBH
1 16.09351 6.138643

The mean DBH is 16.09351, this value represents the average diameter at breast height of all the trees in the dataset for the year 2007. The standard deviation of DBH is 6.138643, this value indicates the variability in the DBH values for the trees in 2007. A high standard deviation means that the tree sizes vary widely from the mean, while a low standard deviation suggests that most trees are similar in size

Question 18: Compute the per species mean tree age using only those ages recorded in 2003. Identify the three species with the oldest mean age.

Code
tree_data_unit |>
  filter(year == 2003) |>
  group_by(species) |>
  summarise(mean_age = mean(age, na.rm = TRUE)) |>
  arrange(desc(mean_age)) |>
  slice_head(n = 3)
# A tibble: 3 × 2
  species mean_age
  <chr>      <dbl>
1 THOC       127. 
2 FRNI        83.1
3 PIST        73.3

The 3 species with the oldest mean age are THOC, FRNI, and PIST

Question 19: In a single summarize call, find the number of unique years with records in the data set along with the first and last year recorded?

Code
tree_data_unit |>
  summarise(num_unique_years = n_distinct(year),
    first_year = min(year),
    last_year = max(year)
  )
  num_unique_years first_year last_year
1              111       1897      2007

The number of unique years with records in the dataset is 111, with the first year recorded being 1897 and the last year recorded being 2007

Question 20: Determine the stands with the largest number of unique years recorded. Report all stands with largest (or tied with the largest) temporal record.

Code
stand_unique_years <- tree_data_unit |>
  group_by(standID) |>
  summarise(num_unique_years = n_distinct(year))

summarise(stand_unique_years, max_unique_years = max(num_unique_years))
# A tibble: 1 × 1
  max_unique_years
             <int>
1              111
Code
filter(stand_unique_years, num_unique_years == 111)
# A tibble: 5 × 2
  standID num_unique_years
    <int>            <int>
1       1              111
2      15              111
3      16              111
4      17              111
5      24              111

The stands with the largest number of unique years recorded are 1, 15, 16, 17, and 24

Final question: We are interested in the annual DBH growth rate of each species through time, but we only want to include trees with at least a 10 year growth record. To identify this, we need to identify the per year growth made by each tree, there total growth record, and then average that, and compute the standard deviation, across the species.

Use a combination of dplyr verbs to compute these values and report the 3 species with the fastest growth, and the 3 species with the slowest growth. (** You will need to use either lag() or diff() in your compuation. You can learn more about each in the Help pages)

Lastly, find and include an image of the fastest growing species. Add the image to your images directory.

Code
tree_data_growth <- tree_data_unit |>
  arrange(treeID, year) |>
  group_by(treeID) |>
  mutate(annual_growth = DBH_cm - lag(DBH_cm)) |>
  filter(! is.na(annual_growth))

tree_data_growth <- tree_data_growth |>
  group_by(treeID, species) |>
  summarise(
    total_years = n_distinct(year),
    avg_growth = mean(annual_growth, na.rm = TRUE),
    sd_growth = sd(annual_growth, na.rm = TRUE),
    .groups = "drop"
  ) |>
  filter(total_years >= 10)

species_growth_sum <- tree_data_growth |>
  group_by(species) |>
  summarise(
    avg_species_growth = mean(avg_growth, na.rm = TRUE),
    sd_species_growth = mean(sd_growth, na.rm = TRUE)
  ) |>
  arrange(desc(avg_species_growth))

slice_head(species_growth_sum, n = 3)
# A tibble: 3 × 3
  species avg_species_growth sd_species_growth
  <chr>                <dbl>             <dbl>
1 PIRE                 0.417             0.191
2 PIBA                 0.362             0.217
3 POTR                 0.351             0.170
Code
slice_tail(species_growth_sum, n = 3)
# A tibble: 3 × 3
  species avg_species_growth sd_species_growth
  <chr>                <dbl>             <dbl>
1 QURU                 0.177            0.0688
2 LALA                 0.172            0.0962
3 THOC                 0.154            0.0674

The 3 species with the fastest growth are PIRE, PIBA, and POTR. On the other hand, the 3 species with the slowest growth are QURU, LALA, and THOC

Here is an image of each the fastest growing species:

PIBA

PIRE

POTR